Wood County
Chiefs heiress Gracie Hunt & her fiancé engage in rather interesting MAHA workout, AAU price reactions & MEAT
Welcome to the numerous new Screencaps readers - trust me, you have to give this column two weeks to understand what's going on
If you are one of the hundreds of thousands of new Screencaps readers who found this column on Monday, welcome back. You're about to become hooked. Just go ahead and clear your daily schedule at 9 a.m. for America's Best Daily Column, as named by the readers who've been with me for years. In some cases, readers have been with me for over a decade. This column is their talk radio.
Minimax optimal differentially private synthetic data for smooth queries
Ding, Rundong, He, Yiyun, Zhu, Yizhe
Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,\delta)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $n^{-\min \{1, \frac{k}{d}\}}$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,\delta)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
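The phase transition at $k=d$ can be made concrete by evaluating the stated rate for a few smoothness levels. The sketch below only evaluates the formula $n^{-\min\{1, k/d\}}$ from the abstract; it ignores the $\log(n)$ factor and all constants, and the function names are hypothetical, not taken from the paper.

```python
def minimax_rate_exponent(k: int, d: int) -> float:
    """Exponent a in the rate n^{-a}, where a = min(1, k/d)."""
    if k < 1 or d < 1:
        raise ValueError("k and d must be positive integers")
    return min(1.0, k / d)


def error_rate(n: int, k: int, d: int) -> float:
    """Evaluate n^{-min(1, k/d)}; log(n) factor and constants are ignored."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n ** (-minimax_rate_exponent(k, d))


# For d = 4 the exponent grows with k until k = d, then stays at 1.
for k in (1, 2, 4, 8):
    print(k, minimax_rate_exponent(k, d=4), error_rate(10_000, k, d=4))
```

Below the transition ($k<d$) extra smoothness improves the exponent; above it ($k \ge d$) the exponent is capped at 1.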
SCALAR: A Part-of-speech Tagger for Identifiers
Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.
The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub.
Index Terms: program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution
I. INTRODUCTION
The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing [2], [3], it is important for researchers to better understand how identifiers convey information, and how they can be improved to increase developer reading efficiency.
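To illustrate what a grammar pattern is, the sketch below splits an identifier into words and maps each word to a tag. It is not SCALAR's method: SCALAR uses a trained GradientBoostingClassifier, whereas this toy uses a hypothetical hand-written lexicon (`TOY_LEXICON`), illustrative tag labels, and an assumed default of noun for unknown words.

```python
import re

# Hypothetical toy lexicon for illustration only; the tag labels are assumptions.
TOY_LEXICON = {
    "get": "V", "set": "V", "parse": "V", "is": "V",
    "user": "N", "name": "N", "count": "N", "item": "N",
    "max": "ADJ", "to": "P",
}


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase, and snake_case names into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def grammar_pattern(name: str, default_tag: str = "N") -> str:
    """Look up each word's tag; unknown words fall back to default_tag."""
    return " ".join(TOY_LEXICON.get(word, default_tag) for word in split_identifier(name))


print(split_identifier("parseHTTPResponse"))
print(grammar_pattern("getUserName"))
print(grammar_pattern("max_item_count"))
```

A context-free lookup like this cannot tell whether "user" in `getUserName` acts as a head noun or as a modifier, which is the kind of distinction a tagger trained on identifier data is meant to capture.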
Geospatial Data Fusion: Combining Lidar, SAR, and Optical Imagery with AI for Enhanced Urban Mapping
Afroosheh, Sajjad, Askari, Mohammadreza
This study explores the integration of Lidar, Synthetic Aperture Radar (SAR), and optical imagery through advanced artificial intelligence techniques for enhanced urban mapping. By fusing these diverse geospatial datasets, we aim to overcome the limitations associated with single-sensor data, achieving a more comprehensive representation of urban environments. The research employs Fully Convolutional Networks (FCNs) as the primary deep learning model for urban feature extraction, enabling precise pixel-wise classification of essential urban elements, including buildings, roads, and vegetation. To optimize the performance of the FCN model, we utilize Particle Swarm Optimization (PSO) for hyperparameter tuning, significantly enhancing model accuracy. Key findings indicate that the FCN-PSO model achieved a pixel accuracy of 92.3% and a mean Intersection over Union (IoU) of 87.6%, surpassing traditional single-sensor approaches. These results underscore the potential of fused geospatial data and AI-driven methodologies in urban mapping, providing valuable insights for urban planning and management. The implications of this research pave the way for future developments in real-time mapping and adaptive urban infrastructure planning.
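The two reported metrics, pixel accuracy and mean IoU, can be computed from flattened label maps as in the sketch below. This is a minimal illustration and not the study's evaluation code: the function names are hypothetical, the labels are toy values, and classes absent from both maps are skipped when averaging, which is one common convention among several.

```python
def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_iou(y_true, y_pred, num_classes):
    """Average of per-class intersection over union, skipping absent classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    ious = []
    for c in range(num_classes):
        intersection = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        union = sum(t == c or p == c for t, p in zip(y_true, y_pred))
        if union > 0:
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class in range(num_classes) occurs in the labels")
    return sum(ious) / len(ious)


# Toy flattened label maps: 0 = building, 1 = road, 2 = vegetation (assumed coding).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(pixel_accuracy(reference, predicted))
print(mean_iou(reference, predicted, num_classes=3))
```

On this toy input, 4 of 6 pixels match, and the per-class IoU values are 1/3, 2/3, and 1/2, so mean IoU is lower than pixel accuracy, as is typical.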
Fusion of Deep Learning and GIS for Advanced Remote Sensing Image Analysis
Afroosheh, Sajjad, Askari, Mohammadreza
This paper presents an innovative framework for remote sensing image analysis by fusing deep learning techniques, specifically Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, with Geographic Information Systems (GIS). The primary objective is to enhance the accuracy and efficiency of spatial data analysis by overcoming challenges associated with high dimensionality, complex patterns, and temporal data processing. We implemented optimization algorithms, namely Particle Swarm Optimization (PSO) and Genetic Algorithms (GA), to fine-tune model parameters, resulting in improved performance metrics. Our findings reveal a significant increase in classification accuracy from 78% to 92% and a reduction in prediction error from 12% to 6% after optimization. Additionally, the temporal accuracy of the models improved from 75% to 88%, showcasing the framework's capability to monitor dynamic changes effectively. The integration of GIS not only enriched the spatial analysis but also facilitated a deeper understanding of the relationships between geographical features. This research demonstrates that combining advanced deep learning methods with GIS and optimization strategies can significantly advance remote sensing applications, paving the way for future developments in environmental monitoring, urban planning, and resource management.
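For readers unfamiliar with Particle Swarm Optimization, a minimal sketch of the basic algorithm follows. It is not the authors' implementation: the objective here is a toy quadratic standing in for a validation loss over two hypothetical parameters, and the swarm size, iteration count, and coefficients (`inertia`, `cognitive`, `social`) are assumed textbook-style values.

```python
import random


def pso_minimize(objective, bounds, num_particles=20, iterations=100,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(num_particles)]
    velocities = [[0.0] * dim for _ in range(num_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    best_idx = min(range(num_particles), key=personal_best_val.__getitem__)
    global_best = personal_best[best_idx][:]
    global_best_val = personal_best_val[best_idx]

    for _ in range(iterations):
        for i in range(num_particles):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[i][j] = (
                    inertia * velocities[i][j]
                    + cognitive * r1 * (personal_best[i][j] - positions[i][j])
                    + social * r2 * (global_best[j] - positions[i][j])
                )
                # Move, then clamp the coordinate to its search bounds.
                positions[i][j] = min(hi, max(lo, positions[i][j] + velocities[i][j]))
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i] = positions[i][:]
                personal_best_val[i] = value
                if value < global_best_val:
                    global_best = positions[i][:]
                    global_best_val = value
    return global_best, global_best_val


def toy_loss(params):
    """Stand-in for a validation loss, minimized at (1, -2)."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


best_params, best_loss = pso_minimize(toy_loss, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best_params, best_loss)
```

In a real tuning setup, `toy_loss` would be replaced by a function that trains or evaluates the model for a given parameter vector, which makes each objective call far more expensive than in this sketch.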
Letters from Our Readers
This would be a rather stunning argument from a Black writer such as myself. I am in fact ecstatic that McDonald is playing the role; my argument was that, given the show's sociohistorical specificity, McDonald should not play the role as a specifically Black character, just as she has not done in the past when portraying (always to perfection) other white characters.
I read Stephania Taladrid's account of Tony Ogburn's efforts to treat women in Texas after the passage of S.B. 8 with my heart in my mouth ("The Texas Exodus," December 2nd). I was especially horrified to hear about the difficulty of treating ectopic pregnancies; as Ogburn says, "it's the standard of care everywhere in the world." An ectopic pregnancy is an inherently nonviable one.
Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling
Khazrak, Iman, Takhirova, Shakhnoza, Rezaee, Mostafa M., Yadollahi, Mehrdad, Green, Robert C. II, Niu, Shuteng
The development of accurate medical image classification models is often constrained by privacy concerns and data scarcity for certain conditions, leading to small and imbalanced datasets. To address these limitations, this study explores the use of generative models, such as Denoising Diffusion Probabilistic Models (DDPM) and Progressive Growing Generative Adversarial Networks (PGGANs), for dataset augmentation. The research introduces a framework to assess the impact of synthetic images generated by DDPM and PGGANs on the performance of four models: a custom CNN, Untrained VGG16, Pretrained VGG16, and Pretrained ResNet50. Experiments were conducted using Random Sampling and Greedy K Sampling to create small, imbalanced datasets. The synthetic images were evaluated using Frechet Inception Distance (FID) and compared to original datasets through classification metrics. The results show that DDPM consistently generated more realistic images with lower FID scores and significantly outperformed PGGANs in improving classification metrics across all models and datasets. Incorporating DDPM-generated images into the original datasets increased accuracy by up to 6%, enhancing model robustness and stability, particularly in imbalanced scenarios. Random Sampling demonstrated superior stability, while Greedy K Sampling offered diversity at the cost of higher FID scores. This study highlights the efficacy of DDPM in augmenting small, imbalanced medical image datasets, improving model performance by balancing the dataset and expanding its size.
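The Fréchet distance behind FID compares two Gaussians fitted to feature vectors of real and generated images: $\|\mu_r - \mu_g\|^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2})$. The sketch below is a simplified special case, not the standard FID pipeline: it assumes diagonal covariances, for which the trace term reduces to a sum of squared differences of standard deviations, uses population variance, and takes arbitrary toy feature rows instead of Inception activations.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diagonal(features_a, features_b):
    """Fréchet distance between per-dimension Gaussians (diagonal covariance)."""
    if not features_a or not features_b:
        raise ValueError("both feature sets must be non-empty")
    dims = len(features_a[0])
    total = 0.0
    for j in range(dims):
        col_a = [row[j] for row in features_a]
        col_b = [row[j] for row in features_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        sd_a = math.sqrt(pvariance(col_a, mu_a))
        sd_b = math.sqrt(pvariance(col_b, mu_b))
        # Mean term plus the diagonal form of the trace term.
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


real = [[0.0, 0.0], [2.0, 2.0]]
shifted = [[1.0, 0.0], [3.0, 2.0]]
print(frechet_distance_diagonal(real, real))
print(frechet_distance_diagonal(real, shifted))
```

Lower values mean the two feature distributions are closer, which is why lower FID is read as more realistic generated images.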
HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction
Xiao, Xi, Wang, Wentao, Xie, Jiacheng, Zhu, Lijing, Chen, Gaofei, Li, Zhengji, Wang, Tianyang, Xu, Min
Drug target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DTA prediction method, termed HGTDP-DTA, which utilizes dynamic prompts within a hybrid Graph-Transformer framework. Our method generates context-specific prompts for each drug-target pair, enhancing the model's ability to capture unique interactions. The introduction of prompt tuning further optimizes the prediction process by filtering out irrelevant noise and emphasizing task-relevant information, dynamically adjusting the input features of the molecular graph. The proposed hybrid Graph-Transformer architecture combines structural information from Graph Convolutional Networks (GCNs) with sequence information captured by Transformers, facilitating the interaction between global and local information. Additionally, we adopted the multi-view feature fusion method to project molecular graph views and affinity subgraph views into a common feature space, effectively combining structural and contextual information. Experiments on two widely used public datasets, Davis and KIBA, show that HGTDP-DTA outperforms state-of-the-art DTA prediction methods in both prediction performance and generalization ability.
Omnipredictors for Regression and the Approximate Rank of Convex Functions
Gopalan, Parikshit, Okoroafor, Princewill, Raghavendra, Prasad, Shetty, Abhishek, Singhal, Mihir
Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions. Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog} (1/\varepsilon)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations, or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability introduced by [GKH+23] for Boolean labels to the regression setting.
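A simple instance of sufficient statistics is the case where every loss is a polynomial in the label: for the family $\ell_p(t, y) = (t - y)^p$ with $p \le 4$, the first four moments of $\mathbf y$ determine the expected loss of every action $t$. The sketch below illustrates only this special case, not the paper's algorithm; the function names, the toy labels, and the grid search over actions are assumptions made for illustration.

```python
from math import comb


def raw_moments(ys, degree):
    """Return [E[y^0], ..., E[y^degree]] for the empirical distribution of ys."""
    n = len(ys)
    return [sum(y ** j for y in ys) / n for j in range(degree + 1)]


def expected_power_loss(t, moments, p):
    """E[(t - y)^p] computed from moments only, via the binomial expansion."""
    return sum(comb(p, j) * t ** (p - j) * (-1) ** j * moments[j]
               for j in range(p + 1))


def best_action(moments, p, grid_size=1001):
    """Grid search over actions t in [0, 1] using the moments alone."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda t: expected_power_loss(t, moments, p))


labels = [0.1, 0.4, 0.4, 0.9]          # toy labels in [0, 1]
moments = raw_moments(labels, degree=4)
print(best_action(moments, p=2))        # squared loss: close to the mean
print(best_action(moments, p=4))        # quartic loss: uses higher moments
```

Once the moments are known, the raw labels are no longer needed to choose the loss-minimizing action for any loss in this family, which is the sense in which the moments act as sufficient statistics here.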
Explaining the Power of Topological Data Analysis in Graph Machine Learning
Taiwo, Funmilola Mary, Islambekov, Umar, Akcora, Cuneyt Gurcan
Topological Data Analysis (TDA) has been praised by researchers for its ability to capture intricate shapes and structures within data. TDA is considered robust in handling noisy and high-dimensional datasets, and its interpretability is believed to promote an intuitive understanding of model behavior. However, claims regarding the power and usefulness of TDA have only been partially tested in application domains where TDA-based models are compared to other graph machine learning approaches, such as graph neural networks. We meticulously test claims on TDA through a comprehensive set of experiments and validate their merits. Our results affirm TDA's robustness against outliers and its interpretability, aligning with proponents' arguments. However, we find that TDA does not significantly enhance the predictive power of existing methods in our specific experiments, while incurring significant computational costs. We investigate phenomena related to graph characteristics, such as small diameters and high clustering coefficients, to mitigate the computational expenses of TDA computations. Our results offer valuable perspectives on integrating TDA into graph machine learning tasks.